Gene duplication (or chromosomal duplication or gene amplification) is any duplication of a region of DNA that contains a gene; it may occur as an error in homologous recombination, a retrotransposition event, or duplication of an entire chromosome.[1] The second copy of the gene is often free from selective pressure; that is, mutations in it have no deleterious effects on its host organism. As a result, over generations it accumulates mutations faster than a functional single-copy gene does.
A duplication is the opposite of a deletion. Duplications can arise from an event termed unequal crossing-over, which occurs during meiosis between misaligned homologous chromosomes. The likelihood of this happening depends on the degree to which the two chromosomes share repetitive elements. The products of this recombination are a duplication at the site of the exchange and a reciprocal deletion.[2]
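Unequal crossing-over can be illustrated with a toy model, assuming each chromosome is a list of named segments and "R" marks a copy of a repetitive element shared by both homologs. The function name `unequal_crossover` and the segment labels are hypothetical; the sketch only shows why one misaligned exchange yields a duplication and a reciprocal deletion at the same time.

```python
from typing import List, Tuple


def unequal_crossover(
    chrom_a: List[str], chrom_b: List[str], break_a: int, break_b: int
) -> Tuple[List[str], List[str]]:
    """Exchange the tails of two homologs at non-matching positions.

    Product 1 keeps chrom_a up to break_a and takes chrom_b from break_b on;
    product 2 is the reciprocal. If the break points sit in equivalent
    positions the exchange is balanced; if they are offset, the segments
    between them are duplicated in one product and deleted from the other.
    """
    product_1 = chrom_a[:break_a] + chrom_b[break_b:]
    product_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return product_1, product_2


# Toy homolog: "R" marks a repetitive element flanking the gene (hypothetical layout).
homolog = ["A", "R", "gene", "R", "B"]

# Misalignment: the crossover falls just after the second R on one homolog
# but just after the first R on the other.
with_duplication, with_deletion = unequal_crossover(homolog, homolog, 4, 2)
print(with_duplication)  # ['A', 'R', 'gene', 'R', 'gene', 'R', 'B']
print(with_deletion)     # ['A', 'R', 'B']
```

No segments are created or destroyed overall; they are only redistributed between the two products, which is why the duplication and the deletion arise together.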
Gene duplication is believed to play a major role in evolution; this view has been held by members of the scientific community for over 100 years.[3] Susumu Ohno was one of the best-known developers of this theory, in his classic book Evolution by Gene Duplication (1970).[4] Ohno argued that gene duplication is the most important evolutionary force since the emergence of the universal common ancestor.[5] Major genome duplication events are not uncommon. The entire yeast genome is believed to have undergone duplication about 100 million years ago.[6] Plants are the most prolific genome duplicators. For example, bread wheat is hexaploid (a kind of polyploid), meaning that it has six sets of chromosomes.
The duplication of a gene results in an additional copy that is free from selective pressure. One view is that this allows the new copy of the gene to mutate without deleterious consequences for the organism. This freedom allows novel genes to arise through mutation, which could increase the fitness of the organism or encode a new function. An example is the apparent evolution of a duplicated digestive gene into an antifreeze gene in a family of ice fish.
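The claim that a redundant copy accumulates mutations faster can be sketched as a simple counting simulation. It assumes that a fixed fraction of new mutations in a single-copy gene are deleterious and purged by selection, while a redundant duplicate is fully neutral; the function name and every parameter value (mutation rate, 70% deleterious, gene length) are made up for illustration, and drift, fixation and multiple hits at one site are ignored.

```python
import random


def retained_mutations(
    sites: int, generations: int, mu: float, fraction_deleterious: float,
    rng: random.Random,
) -> int:
    """Count mutations retained in one gene copy along a single lineage.

    Each generation every site mutates with probability ``mu``. With
    probability ``fraction_deleterious`` a new mutation is harmful and is
    assumed to be purged by selection; otherwise it is kept. Multiple hits
    at one site, drift and population size are all ignored.
    """
    kept = 0
    for _ in range(generations):
        for _ in range(sites):
            if rng.random() < mu and rng.random() >= fraction_deleterious:
                kept += 1
    return kept


rng = random.Random(42)
# Single-copy gene: assume 70% of mutations damage an essential function.
single_copy = retained_mutations(500, 400, 1e-3, 0.7, rng)
# Redundant duplicate: the other copy covers the function, so nothing is purged.
redundant_copy = retained_mutations(500, 400, 1e-3, 0.0, rng)
print(single_copy, redundant_copy)
```

On average the constrained copy keeps about 60 mutations and the redundant copy about 200 (500 × 400 × 0.001 = 200 candidate mutations, of which 30% survive under the assumed constraint).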
Another view is that both copies are equally free to accumulate degenerative mutations, so long as any defects are complemented by the other copy. This leads to a neutral "subfunctionalization" or DDC (duplication-degeneration-complementation) model,[7][8] in which the functionality of the original gene is distributed among the two copies.
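The duplication-degeneration-complementation idea can be sketched by treating the ancestral gene's functions as a set of subfunctions (here, hypothetical expression domains). The sketch assumes that a degenerative mutation is tolerated only if the other copy still carries the lost subfunction, and it runs until no subfunction remains redundant; `ddc_resolve` and the tissue labels are illustrative names, not part of the model's literature.

```python
import random
from typing import Set, Tuple


def ddc_resolve(
    subfunctions: Set[str], rng: random.Random
) -> Tuple[Set[str], Set[str]]:
    """Let two duplicate copies degrade until no subfunction is redundant.

    Both copies start with every ancestral subfunction. At each step a
    degenerative mutation knocks out one subfunction in one randomly chosen
    copy. Only losses that the other copy complements are tolerated, so the
    loop draws from subfunctions both copies still share.
    """
    copy_1, copy_2 = set(subfunctions), set(subfunctions)
    while True:
        shared = sorted(copy_1 & copy_2)  # sorted for reproducible choices
        if not shared:
            return copy_1, copy_2
        lost = rng.choice(shared)
        target = copy_1 if rng.random() < 0.5 else copy_2
        target.discard(lost)


# Hypothetical subfunctions: tissue-specific expression domains.
ancestral = {"liver", "brain", "gut", "eye"}
copy_1, copy_2 = ddc_resolve(ancestral, random.Random(3))
print(sorted(copy_1), sorted(copy_2))
```

Whatever the random path, the two copies end up disjoint and together still cover every ancestral subfunction. If one copy happens to end up empty, that outcome corresponds to loss of the copy rather than subfunctionalization.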
The two genes that exist after a gene duplication event are called paralogs and usually code for proteins with a similar function and/or structure. By contrast, orthologs are genes in different species that descend from a single ancestral gene through a speciation event, and they typically code for proteins with similar functions (see Homology of sequences in genetics).
It is important, but often difficult, to distinguish paralogs from orthologs in biological research. Experiments on human gene function can often be carried out in another species if a homolog of the human gene can be found in that species' genome, but only if the homolog is orthologous. If the two genes are paralogs that arose from a gene duplication event, their functions are likely to have diverged too far for results obtained with one to apply to the other.
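In tree-based terms, two genes are orthologs if their last common ancestor in the gene tree is a speciation event and paralogs if it is a duplication event. The sketch below assumes the gene tree has already been reconciled with the species tree so that every internal node carries one of those two labels; the `Node` class, the helper names and the X/Y gene tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality/hashing, so nodes can go in sets
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" on internal nodes
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def lineage(node: Optional[Node]) -> List[Node]:
    """Return the node followed by all of its ancestors up to the root."""
    path: List[Node] = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def relationship(gene_a: Node, gene_b: Node) -> str:
    """Classify two genes by the event at their last common ancestor."""
    ancestors_a = set(lineage(gene_a))
    lca = next(n for n in lineage(gene_b) if n in ancestors_a)
    if lca.event == "speciation":
        return "orthologs"
    if lca.event == "duplication":
        return "paralogs"
    raise ValueError(f"common ancestor {lca.name!r} has no event label")


# Hypothetical gene tree: a duplication produced genes X and Y, and a later
# speciation split each of them into a human copy and a mouse copy.
root = Node("ancestral gene", "duplication")
x_anc = root.add(Node("X ancestor", "speciation"))
y_anc = root.add(Node("Y ancestor", "speciation"))
human_x, mouse_x = x_anc.add(Node("human X")), x_anc.add(Node("mouse X"))
human_y, mouse_y = y_anc.add(Node("human Y")), y_anc.add(Node("mouse Y"))

print(relationship(human_x, mouse_x))  # orthologs
print(relationship(human_x, human_y))  # paralogs
print(relationship(human_x, mouse_y))  # paralogs
```

The last case shows why the distinction is easy to get wrong: human X and mouse Y sit in different species, yet they are paralogs because they trace back to the duplication, not to the speciation.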
Paralogous segments can be repeat sequences with more than 90% sequence similarity. Such segments are known as low copy repeats (LCRs), as distinct from highly repetitive sequences. They are found mostly in pericentromeric, subtelomeric and interstitial regions of a chromosome. Because of their size (>1 kb), similarity and orientation, LCRs are highly susceptible to duplications and deletions. These genomic rearrangements are caused by non-allelic homologous recombination, and the resulting genomic variation leads to gene-dosage-dependent neurological disorders such as Rett-like syndrome and Pelizaeus-Merzbacher disease.[9]
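The two LCR criteria mentioned above (length over 1 kb, similarity over 90%) can be checked for a pair of segments with a short sketch. It assumes the two sequences are already aligned without gaps and uses plain percent identity as the similarity measure; real LCR detection relies on genome-wide alignment, and the function names and toy sequences here are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions in an ungapped, equal-length alignment."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("expected two non-empty, pre-aligned sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_lcr_criteria(seq_a: str, seq_b: str,
                       min_length: int = 1000, min_identity: float = 90.0) -> bool:
    """Check the size (>1 kb) and similarity (>90%) thresholds from the text."""
    return len(seq_a) > min_length and percent_identity(seq_a, seq_b) > min_identity


segment = "ACGT" * 300                   # 1,200 bp toy segment
bases = list(segment)
for i in range(0, len(bases), 24):       # 50 substitutions (A -> T)
    bases[i] = "T"
paralog = "".join(bases)

print(round(percent_identity(segment, paralog), 2))       # 95.83
print(meets_lcr_criteria(segment, paralog))               # True
print(meets_lcr_criteria(segment[:400], paralog[:400]))   # False: too short
```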
Gene duplication does not necessarily constitute a lasting change in a species' genome; in fact, such changes often do not last past the initial host organism. From the perspective of molecular genetics, amplification is one of many ways in which a gene can be overexpressed. Genetic amplification can occur artificially, as with the polymerase chain reaction (PCR), which uses enzymes to amplify short stretches of DNA in vitro, or it can occur naturally, as described above. Even a natural duplication may take place in a somatic cell rather than a germline cell, and only a germline duplication can produce a lasting evolutionary change.
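The artificial amplification achieved by PCR follows the textbook idealization in which each cycle copies every template, so the amount of DNA roughly doubles per cycle. The sketch below adds an assumed per-cycle efficiency parameter E (1.0 means perfect doubling) and does not model the plateau phase reached when reagents run out; the function name is hypothetical.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` rounds of PCR.

    With efficiency 1.0 every template is copied each cycle, so the amount
    doubles: N = N0 * (1 + E) ** n. The late plateau phase, when reagents
    run out, is not modeled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 30))  # 1073741824.0, i.e. 2**30
print(pcr_copies(100, 10, 0.9))
```

Even a single template molecule reaches about a billion copies after 30 ideal cycles, which is why PCR is such an effective way to amplify a short stretch of DNA.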
Technologies such as genomic microarrays, also called array comparative genomic hybridization (array CGH), are used to detect chromosomal abnormalities, such as microduplications, in a high-throughput fashion from genomic DNA samples. Relatedly, DNA microarrays can simultaneously monitor the expression levels of thousands of genes across many treatments or experimental conditions, greatly facilitating evolutionary studies of gene regulation after gene duplication or speciation.[10][11]
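Array CGH reports, for each probe, the log2 ratio of test to reference signal. Against a diploid reference, a single-copy gain ideally gives log2(3/2) ≈ 0.585 and a single-copy loss gives log2(1/2) = -1. The sketch below calls gains and losses with a naive threshold and then merges consecutive probes; the ±0.3 threshold, the three-probe minimum and all function names are assumptions for illustration, and production pipelines use proper segmentation algorithms (such as circular binary segmentation) instead of this simple run-merging.

```python
import math
from typing import List, Tuple


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Ideal test/reference log2 ratio for a locus with the given copy numbers."""
    return math.log2(test_copies / reference_copies)


def call_probe(log2_ratio: float, threshold: float = 0.3) -> str:
    """Label one probe as a gain, a loss, or neutral (assumed threshold)."""
    if log2_ratio >= threshold:
        return "gain"
    if log2_ratio <= -threshold:
        return "loss"
    return "neutral"


def call_segments(ratios: List[float], threshold: float = 0.3,
                  min_probes: int = 3) -> List[Tuple[int, int, str]]:
    """Merge runs of consecutive probes with the same non-neutral call.

    Returns (first_probe, last_probe_exclusive, call) for every run of at
    least ``min_probes`` probes, in probe order along the chromosome.
    """
    calls = [call_probe(r, threshold) for r in ratios]
    segments: List[Tuple[int, int, str]] = []
    start = 0
    for i in range(1, len(calls) + 1):
        if i == len(calls) or calls[i] != calls[start]:
            if calls[start] != "neutral" and i - start >= min_probes:
                segments.append((start, i, calls[start]))
            start = i
    return segments


print(round(expected_log2_ratio(3), 3))  # 0.585: one extra copy in a diploid genome
print(expected_log2_ratio(1))            # -1.0: one copy lost

probes = [0.02, -0.05, 0.61, 0.55, 0.58, 0.60, 0.01, -0.03]
print(call_segments(probes))  # [(2, 6, 'gain')]
```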
Duplications of oncogenes are a common cause of many types of cancer, as with the amplification of the p70 S6 kinase 1 gene in breast cancer.[12] In such cases, the duplication occurs in a somatic cell and affects only the genome of the cancer cells themselves, not the entire organism, much less any subsequent offspring.
| Cancer type | Associated gene amplification | Prevalence of amplification in cancer type (percent) |
|---|---|---|
| Breast cancer | MYC | 20[13] |
| | ERBB2 (HER2) | 20[13] |
| | CCND1 (Cyclin D1) | 15-20[13] |
| | FGFR1 | 12[13] |
| | FGFR2 | 12[13] |
| Cervical cancer | MYC | 25-50[13] |
| | ERBB2 | 20[13] |
| Colorectal cancer | HRAS | 30[13] |
| | KRAS | 20[13] |
| | MYB | 15-20[13] |
| Esophageal cancer | MYC | 40[13] |
| | CCND1 | 25[13] |
| | MDM2 | 13[13] |
| Gastric cancer | CCNE (Cyclin E) | 15[13] |
| | KRAS | 10[13] |
| | MET | 10[13] |
| Glioblastoma | ERBB1 (EGFR) | 33-50[13] |
| | CDK4 | 15[13] |
| Head and neck cancer | CCND1 | 50[13] |
| | ERBB1 | 10[13] |
| | MYC | 7-10[13] |
| Hepatocellular cancer | CCND1 | 13[13] |
| Neuroblastoma | MYCN | 20-25[13] |
| Ovarian cancer | MYC | 20-30[13] |
| | ERBB2 | 15-30[13] |
| | AKT2 | 12[13] |
| Sarcoma | MDM2 | 10-30[13] |
| | CDK4 | 10[13] |
| Small cell lung cancer | MYC | 15-20[13] |